Projects: TEEMSS2 : Distributed Object Store

This page last changed on Sep 08, 2004 by scytacki.

Task                 Effort (Scott at 50%)  Planned Start  Planned Finish  Actual Start  Actual Finish
evaluate ozone       1 wk                   Jul 5          Jul 9           Jul 15        Jul 20
Syncing Design       2 days                 Jul 20         Jul 28          Jul 20
Syncing Milestones   4 wk                   Jul 29         Aug 20
(break while finishing vendor integration)
Java 1.4 version 1   1 mo                   Oct 11         Nov 5
Waba version 1       1 mo                   Nov 5          Dec 3
Palm Syncing         1 mo                   Dec 6          Jan 7 2005

(2 weeks more design) This doesn't need to be implemented until late summer, right? But the design will influence the collaboration aspect of the activities, and the importing and exporting from each manufacturer. It also influences the portal design.

Collaboration: how much collaboration do we want, and how much can we support?

This page is under serious construction. Don't expect to be able to understand it easily. Basically it is a collection of ramblings on this part of the system. It hasn't materialized into a design yet. At this point I would say it needs two weeks of full-time work, or one month of my 50% time, to turn it into a more concrete design:

go over the requirements.
verify that my simplification of some requirements is ok with the project group.
look at the pluses and minuses of the different technologies we can use.
after picking some technologies, make a plan for implementing it with them.

It could take much more than 5 days, but I think I could set that as a deadline for working on this and be happy with what I had after that amount of time.

This is the core technology of the whole project. I think this is what is most likely to live beyond the project and be useful to handheld probe and curriculum companies. If it is designed right it can be separated from Waba and Java. On the Palm it can just be a format for a Palm database file, or a PocketPC database file/program. On the desktop it will have a database format that could be used by Java libraries, PHP libraries, or other native libraries.

On a Palm a dynamic group must be set up at the beginning of class. This way data can be associated with each user.

It would also be ideal for every Palm to have every user's data. That way any student can pick up any Palm. This would need to be limited somehow: at least at the level of the school, perhaps at the level of the class, or if that doesn't fit, at the level of the student.

Also, any DOS (distributed object store) objects that save user data need to support multiple users accessing those objects. And each user or group needs their own view of those objects.

The main idea is to have a Class or Interface that objects can extend, and then they can be saved in this system. When they are saved, their code is saved too. Then when they are used, their state and code are moved to wherever they are needed. The objects do not need to worry about the details of being moved from here to there. If they need another object, it will be automatically found and made available. These objects also have a formal notion of users, so permissions can be implemented. Finally, they also have version control. This is an abstraction that I always find myself wanting.
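A minimal sketch of that abstraction, written in Python only for brevity. Every name here (StoredObject, ObjectStore, Note, code_ref, depends_on) is hypothetical and not taken from any existing code in the project. It assumes an in-memory store and shows only the save side: state, owner, a reference to the code, and a version that is kept for every save. Moving objects between machines and finding dependencies automatically are left out.

```python
import json
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    """Hypothetical base class: anything that extends it can be saved."""
    object_id: str
    owner: str                    # formal notion of a user, for permissions
    version: int = 1              # bumped on every save (version control)
    code_ref: str = ""            # address of the code that implements the object
    depends_on: list = field(default_factory=list)  # addresses of objects it needs

    def get_state(self):
        """Subclasses return their own serializable state."""
        return {}


@dataclass
class Note(StoredObject):
    """Example subclass with one piece of user data."""
    text: str = ""

    def get_state(self):
        return {"text": self.text}


class ObjectStore:
    """Toy in-memory store that keeps every saved version of every object."""

    def __init__(self):
        self._records = {}  # (object_id, version) -> JSON record

    def save(self, obj):
        record = {
            "owner": obj.owner,
            "code_ref": obj.code_ref,
            "depends_on": list(obj.depends_on),
            "state": obj.get_state(),
        }
        self._records[(obj.object_id, obj.version)] = json.dumps(record)
        saved_version = obj.version
        obj.version += 1  # the next save becomes a new version
        return saved_version

    def load(self, object_id, version):
        return json.loads(self._records[(object_id, version)])


store = ObjectStore()
note = Note("note-1", "student_a", code_ref="notes.jar@1", text="first draft")
v1 = store.save(note)
note.text = "second draft"
v2 = store.save(note)
```

Both versions stay readable after the second save, which is the property the version control requirement asks for. A real store would write to a Palm or desktop database file rather than a dictionary.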

It has its own syncing mechanism, which makes the data available internet-wide.

Things that I've found that are similar to this are DHTs, or distributed hash tables. These are a recent movement that is documented a little on the TELS site. Look under the distributed heading of this page: http://docs.telscenter.org/display/SAIL/Development+Technologies

I have not found an existing technology that does everything we need. So what do we need:

replication
conflict resolution
version
- management
- linking dependencies based on version
code stored with data
bandwidth efficient
easy to install
works on unsecure local storage
permissions
dependency graph that can be quickly traversed
upload and download content
p2p sharing of data when server isn't available

Here is a matrix of these features and a few technologies:
http://concord.org/~scytacki/DistributeObjectStoreTechnologies.xls

It might be possible to divide this into parts.

If we base it on a two-way, unsecure data-transfer and caching system like workspace.jar or Subversion, then the parts are:

data-transfer and caching - data goes up and down, but the items are just blobs of binary data with addresses
- one problem at this level is version control. The address of an object needs to include version info somehow so version dependencies can be handled correctly. Subversion could do this; workspace.jar doesn't, but could.
dependency info - this can be just a data object that goes up and down with the other data objects. This is how we implemented it in the workspace code. Here is a discussion of how dependency info affects replication: http://docs.telscenter.org/display/SAIL/Content+Deployment
permissions: data-privacy, read-only, write-only - privacy can be achieved by encrypting the blobs. Read-only can be achieved with a secure digital signature for the blob. This requires a set of secured, trusted public keys (as SSL and jar signing do it). But allowing peer-to-peer sharing of data in a system like this is very complicated: because no one is trusted, the peers must all establish keys so that they can change blobs without letting others change them.
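The first two parts can be sketched together. This is an illustration under assumptions, not how workspace.jar or Subversion work: the "name@version" address format, the ".deps" naming convention, and the names BlobCache, put_manifest, and get_dependencies are all made up for the example. The point is that the transfer layer sees only opaque bytes with versioned addresses, and dependency info travels as one more blob.

```python
import json


class BlobCache:
    """Toy blob layer: opaque bytes stored under 'name@version' addresses."""

    def __init__(self):
        self._blobs = {}  # address -> bytes

    @staticmethod
    def address(name, version):
        # Assumed address format; the version is part of the address itself.
        return f"{name}@{version}"

    def put(self, name, version, data):
        addr = self.address(name, version)
        self._blobs[addr] = data
        return addr

    def get(self, addr):
        return self._blobs[addr]


def put_manifest(cache, name, version, deps):
    """Store dependency info as just another blob, next to the object it describes."""
    data = json.dumps({"depends_on": sorted(deps)}).encode("utf-8")
    return cache.put(name + ".deps", version, data)


def get_dependencies(cache, name, version):
    """Read back the versioned addresses that name@version depends on."""
    raw = cache.get(BlobCache.address(name + ".deps", version))
    return json.loads(raw)["depends_on"]


cache = BlobCache()
cache.put("probe-lib", 1, b"old probe code")
cache.put("probe-lib", 2, b"new probe code")
cache.put("activity", 1, b"activity content")
manifest_addr = put_manifest(cache, "activity", 1, ["probe-lib@2"])
```

Because the version is in the address, both versions of probe-lib can sit in the same cache, and the activity's manifest pins the one it needs.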

Is there another option besides using that as a base? The main problems are dealing with the unsecure storage and efficient dependency usage. We can do efficient dependencies if the server tracks the clients and the clients track their updates. Subversion does the second half of this. The first half might make p2p hard.
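A rough sketch of the "server tracks the clients" half, assuming dependencies are known to the server as a map from address to addresses. SyncServer, plan_download, and the sample addresses are hypothetical names for illustration. The server walks the dependency graph from the requested object and sends only what it has not already delivered to that client.

```python
def transitive_closure(root, deps):
    """Every address reachable from root (including root) in the dependency graph."""
    seen = set()
    stack = [root]
    while stack:
        addr = stack.pop()
        if addr in seen:
            continue
        seen.add(addr)
        stack.extend(deps.get(addr, []))
    return seen


class SyncServer:
    """Toy server that remembers which blobs each client already has."""

    def __init__(self, deps):
        self.deps = deps        # address -> list of addresses it depends on
        self.client_has = {}    # client id -> set of addresses already delivered

    def plan_download(self, client_id, root):
        """Return only the blobs this client is still missing for root."""
        have = self.client_has.setdefault(client_id, set())
        needed = transitive_closure(root, self.deps) - have
        have |= needed          # record the delivery for the next request
        return sorted(needed)


deps = {
    "activity@1": ["probe-lib@2", "graph@1"],
    "activity@2": ["probe-lib@2", "graph@2"],
    "probe-lib@2": ["util@1"],
}
server = SyncServer(deps)
first = server.plan_download("palm-7", "activity@1")
second = server.plan_download("palm-7", "activity@2")
```

The second request transfers only the two blobs that changed. The per-client state lives on the server, which is the part that may be hard to reproduce when peers sync with each other and no server is reachable.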

So essentially it seems like splitting it this way is ok. Now we talk about the requirements for each part of the split.

binary transfer and caching

ramblings about caching

This system needs to work through school proxies and firewalls.
It needs to support data integrity. This will be partially handled by the third part (users, security, permissions).
It should support sending data to and receiving data from a server.
It should be flexible enough to support security additions so that it is safe to use in an unsecure environment.

dependency management

efficient downloads
efficient server processor and disk usage
supports version dependencies, so there can be multiple versions of code and jar files.

users, security, permissions

relatively secure on an unsecure local file system
allows data to be shared while secure server is not available
student results should be secure while in transport
user information should be protected while in transport
external content should be signed so it cannot be replaced by "bad" content

An object database might have the dependency management built in. But this needs to be linked with the transfer code, so we'll probably need our own dependency management.

I originally thought we would need a full ACL (access control list) type system, so I started working on an elaborate key system to support this. Here was the first pass at that. However, now I think we don't NEED that. It would be nice, but probably all we really need is what I listed above: signed content and encrypted results. Data shared between users can be unencrypted but signed by the creator. For p2p data sharing there won't be a way to keep student A from replacing student B's data. But a user would know the data wasn't created by B, because there is no way for A to sign it as B. So in this system only the users' public keys need to be protected. And that can be handled if the list of keys is signed by a trusted key.
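The trust chain in that simplified scheme can be sketched as follows. This is a toy under loud assumptions: it uses textbook RSA with no padding, built from known Mersenne primes, which is NOT secure and only stands in for a real signature library. The names make_keypair, sign, verify, and is_from are hypothetical, and pow(E, -1, m) needs Python 3.8 or later.

```python
import hashlib
import json

E = 65537  # common RSA public exponent


def make_keypair(p, q):
    """Textbook RSA key pair from two primes. Toy only: no padding, fixed primes."""
    n = p * q
    d = pow(E, -1, (p - 1) * (q - 1))  # modular inverse of E
    return (n, E), (n, d)              # (public key, private key)


def _digest(data, n):
    """SHA-256 of the data, reduced into the range of the modulus."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n


def sign(data, private_key):
    n, d = private_key
    return pow(_digest(data, n), d, n)


def verify(data, signature, public_key):
    n, e = public_key
    return pow(signature, e, n) == _digest(data, n)


def is_from(creator, data, signature, key_list, key_list_sig, trusted_pub):
    """True only if the key list is signed by the trusted key and the data
    is signed by the key that the list records for `creator`."""
    if not verify(key_list, key_list_sig, trusted_pub):
        return False
    keys = json.loads(key_list)
    if creator not in keys:
        return False
    return verify(data, signature, tuple(keys[creator]))


# Known Mersenne primes stand in for real key generation.
trusted_pub, trusted_priv = make_keypair(2**521 - 1, 2**607 - 1)
a_pub, a_priv = make_keypair(2**61 - 1, 2**89 - 1)
b_pub, b_priv = make_keypair(2**107 - 1, 2**127 - 1)

# The list of user public keys is the only thing the trusted key must sign.
key_list = json.dumps({"student_a": a_pub, "student_b": b_pub},
                      sort_keys=True).encode("utf-8")
key_list_sig = sign(key_list, trusted_priv)

real = b"B's lab results"
fake = b"results swapped in by A"
real_ok = is_from("student_b", real, sign(real, b_priv),
                  key_list, key_list_sig, trusted_pub)
fake_ok = is_from("student_b", fake, sign(fake, a_priv),
                  key_list, key_list_sig, trusted_pub)
```

Student A can still overwrite B's blob on a shared device, but the replacement fails the check when it is presented as B's, which matches the limitation described above. An actual implementation would use a vetted signature scheme rather than this arithmetic.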

- formalize these ideas and compare them to existing Palm and PocketPC technologies. Why do we need our own syncing strategy? What are the essential differences?

Document generated by Confluence on Jan 27, 2014 16:43